This section shows the main steps that have been applied to pre-process the raw data.
The CDOM spectra were modeled according to the information in Babin 2003.
acdom spectra were re-fitted using the complete data (i.e. between 350-500 nm).
Average background values calculated between 683-687 nm.
Some files were in binary format, so I could not open them (e.g. C2001000.YSA).
Some spectra start at 300 nm while others at 350 nm.
Calculated the correlation between the measured and the fitted values.
Exported the complete spectra (350-700 nm): both the raw and the modeled data.
There were negative values in the irradiance data (Ed, Eu, Kd, Ku). I have cleaned the data by setting these negative values to NA.
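As a rough illustration of the CDOM steps listed above (background subtraction over 683-687 nm, exponential refit over 350-500 nm, correlation between measured and fitted values), here is a minimal Python sketch. It assumes a single-exponential model referenced to 443 nm, a(λ) = a(443) · exp(−S(λ − 443)), and estimates the parameters with a simple log-linear least-squares fit, which is not necessarily the fitting procedure used for the actual data. The function names and the synthetic spectrum are hypothetical.

```python
import math


def subtract_background(wavelengths, values, lo=683, hi=687):
    """Subtract the mean value over [lo, hi] nm from the whole spectrum."""
    window = [v for w, v in zip(wavelengths, values) if lo <= w <= hi]
    background = sum(window) / len(window)
    return [v - background for v in values], background


def fit_exponential(wavelengths, values, lo=350, hi=500, ref=443):
    """Log-linear least-squares fit of a(w) = a_ref * exp(-S * (w - ref)).

    Only strictly positive values inside [lo, hi] nm are used.
    Returns (a_ref, S).
    """
    pts = [(w - ref, math.log(v)) for w, v in zip(wavelengths, values)
           if lo <= w <= hi and v > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic spectrum (assumed values): exponential decay plus a constant offset.
wavelengths = list(range(350, 701))
raw = [0.2 * math.exp(-0.017 * (w - 443)) + 0.01 for w in wavelengths]

corrected, background = subtract_background(wavelengths, raw)
a443, slope_s = fit_exponential(wavelengths, corrected)
fitted = [a443 * math.exp(-slope_s * (w - 443)) for w in wavelengths]

# Correlation between measured (corrected) and fitted values over the fit range.
in_range = [i for i, w in enumerate(wavelengths) if 350 <= w <= 500]
r = pearson_r([corrected[i] for i in in_range], [fitted[i] for i in in_range])
print(f"background={background:.5f} a443={a443:.4f} S={slope_s:.4f} r={r:.5f}")
```

The log-linear fit cannot use zero or negative values, so they are skipped here; a non-linear least-squares fit would avoid that restriction.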
NA.DOC, AQY) from Massimo 2000.
Just some graphs to visualize the data. Note that the same color palette will be used to represent the areas in all graphics.
There is a total of 424 different stations that were sampled during the COASTLOOC expeditions.
Just an overview of the available variables (excluding radiometric measurements).
Overview of the averaged absorption spectra for each area. The acdom spectra have been refitted from the original/raw spectra corrected with the new 746-750 nm background.
Comparing acdom443 for the different areas shows a clear gradient from open to coastal waters.
We can see that the DOC follows the same pattern as acdom443.
We can also use scatter-plots to further explore the relationships among variables.
Relationships between some pigments.
There are two stations without geographical coordinates: C2001000, C2002000.
Find a good way to flag the data:
a_phy, a_nap, a_tot spectra contain negative absorption values, but this does not mean they are bad spectra.
There are a lot of nutrient parameters that have values of zero. Are they true zeros, or do they indicate missing values?
Babin 2003 states:
A baseline correction was applied by subtracting the absorbance value averaged over a 5-nm interval around 685 nm from all the spectral values.
In the data, there is a variable called y_model_intercept. Is this variable really a value derived from a model? I think it is more the average value calculated between 683-687 nm. If I am right, y_model_intercept should be renamed to background_a_cdom_average_683_687.
Babin 2003 also states:
…from all the measured spectral values of ap(λ) and aNAP(λ), respectively (to be exact, the averages of the measured values between 746 and 750 nm were subtracted).
However, I was told that the background values were calculated between 745-750 nm. I calculated new background values between 746-750 nm. If I compare both background values, they fit perfectly on the 1:1 line.
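The comparison between the two averaging windows can be sketched in a few lines of Python. The spectrum values below are made up for illustration, and `window_mean` is a hypothetical helper, not a function from the original processing code.

```python
def window_mean(wavelengths, values, lo, hi):
    """Mean of the values whose wavelength lies in [lo, hi] nm (inclusive)."""
    selected = [v for w, v in zip(wavelengths, values) if lo <= w <= hi]
    if not selected:
        raise ValueError(f"no data between {lo} and {hi} nm")
    return sum(selected) / len(selected)


# Made-up absorption values near the red end of a spectrum.
wavelengths = [744, 745, 746, 747, 748, 749, 750, 751]
a_p = [0.0121, 0.0120, 0.0119, 0.0119, 0.0118, 0.0118, 0.0117, 0.0117]

back_745_750 = window_mean(wavelengths, a_p, 745, 750)
back_746_750 = window_mean(wavelengths, a_p, 746, 750)
difference = back_745_750 - back_746_750
print(back_745_750, back_746_750, difference)
```

If the spectra are nearly flat in this region, as in this made-up example, including or excluding the 745 nm value changes the mean very little, which would be consistent with the two sets of background values falling on the 1:1 line.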
There are acdom data (e.g. C5006066) that have no entries in SurfaceData5(C4corr).txt. That means that we have neither coordinates nor area for stations like C5006066. The next list shows all the acdom stations without metadata.
## # A tibble: 14 x 1
## station
## <chr>
## 1 C5006066
## 2 C5007015
## 3 C5008009
## 4 C5009015
## 5 C5015012
## 6 C5030023
## 7 C5033015
## 8 C5034013
## 9 C5035012
## 10 C5036013
## 11 C5037013
## 12 C5049017
## 13 C5050020
## 14 C5053025
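A list like the one above is essentially an anti-join: stations that appear in the acdom data but not in the metadata file. The Python sketch below shows the idea with a set difference. The helper name is hypothetical, and the inputs are illustrative only: three station IDs from the list above plus two invented IDs standing in for stations that do have metadata.

```python
def stations_without_metadata(acdom_stations, metadata_stations):
    """Return, sorted, the stations present in acdom but absent from metadata."""
    return sorted(set(acdom_stations) - set(metadata_stations))


# "X0000001" and "X0000002" are invented placeholder IDs, not real stations.
acdom_stations = ["C5006066", "X0000001", "C5007015", "C5008009", "X0000002"]
metadata_stations = ["X0000001", "X0000002"]

missing = stations_without_metadata(acdom_stations, metadata_stations)
print(missing)
```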
My understanding is that the background variables in the original data should be renamed as follows (validate if 745-750 nm or 746-750 nm):
background_a_cdom_average_683_687 = y_model_intercept
background_a_phy_average_745_750 = back_pga
background_a_nap_average_745_750 = back_dta
background_a_tot_average_745_750 = back_toa
Eu means that Eu0- was estimated (see with Marcel, I do not remember what it means).
Ed is Ed0- calculated from 0.96 x ed0+.
ed to ed0-.
Do we simply set negative values to NA or completely remove the spectral profile? For example, we can look at the ed values for station A2008000.
| station | wavelength | eu | ed | ku | kd |
|---|---|---|---|---|---|
| A2008000 | 411 | 2.500 | 15.693 | -95.000 | 0.095 |
| A2008000 | 443 | 2.800 | 24.754 | -95.000 | 0.083 |
| A2008000 | 456 | 3.300 | 30.747 | -95.000 | 0.077 |
| A2008000 | 490 | 3.200 | 26.328 | -95.000 | 0.070 |
| A2008000 | 509 | NA | 25.887 | -95.000 | 0.082 |
| A2008000 | 532 | 2.300 | 21.366 | -95.000 | 0.091 |
| A2008000 | 559 | 1.600 | 19.003 | -95.000 | 0.106 |
| A2008000 | 619 | 0.260 | 13.520 | 0.376 | 0.324 |
| A2008000 | 665 | 0.154 | 16.091 | 0.301 | 0.445 |
| A2008000 | 683 | 0.287 | 13.657 | 0.205 | 0.494 |
| A2008000 | 705 | 0.051 | 7.725 | 0.225 | 0.654 |
| A2008000 | 779 | NA | 3.911 | -95.000 | NA |
| A2008000 | 866 | NA | -2.781 | -95.000 | NA |
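To make the two options concrete, the Python sketch below applies both to the ed values of station A2008000 from the table above. The function names are hypothetical, and `None` stands in for NA.

```python
# ed values for station A2008000, copied from the table above.
wavelengths = [411, 443, 456, 490, 509, 532, 559, 619, 665, 683, 705, 779, 866]
ed = [15.693, 24.754, 30.747, 26.328, 25.887, 21.366, 19.003,
      13.520, 16.091, 13.657, 7.725, 3.911, -2.781]


def mask_negatives(values):
    """Option 1: replace each negative value with None, keep the rest."""
    return [None if v is not None and v < 0 else v for v in values]


def drop_if_any_negative(values):
    """Option 2: discard the whole profile (return None) if any value is negative."""
    has_negative = any(v is not None and v < 0 for v in values)
    return None if has_negative else list(values)


ed_masked = mask_negatives(ed)
ed_dropped = drop_if_any_negative(ed)
n_kept = sum(v is not None for v in ed_masked)
print(n_kept, ed_dropped)
```

For this station, option 1 keeps 12 of the 13 ed values (only 866 nm is masked), while option 2 discards the entire profile.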
a(715) is always at 0.
There are negative values.
The data is a mix of temporal and spatial observations, so how should we present the data?
By area?